Automatic indexing of scanned documents: a layout-based approach

Authors

  • Daniel Esser
  • Daniel Schuster
  • Klemens Muthmann
  • Michael Berger
  • Alexander Schill
Abstract

Archiving official written documents such as invoices, reminders and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender’s name, creation date or document number, is tedious manual work. We present a novel approach to handle automatic indexing of documents based on generic positional extraction of...

Similar resources

Automatic Analysis and indexing of variable-layout documents

In this paper a methodology for analysis and automatic indexing of imaged documents within an archiving and retrieval system is described. This system, which is being developed within the Esprit project STRETCH (STorage and RETrieval by Content of imaged documents), is based on a new generation Archiving and Retrieval Engine (ARE), which overcomes the bottleneck of document profiling by allevia...

Retrieving Images of Scanned Text Documents

Information retrieval is the task of finding documents, usually text, which are relevant to a user's information need. A conventional approach to information management of paper documents is normally based on classifying them into a hierarchical classification structure. More recently we have seen electronic document management systems which manage scanned images of documents in the same way as pa...

Automatic Indexing for Storage and Retrieval of Line Drawings

The usefulness of a collection of scanned graphical documents can be measured by the facilities available for their retrieval. We present an approach for indexing a collection of line drawings automatically. The indexing is based on the textual and graphical content of the drawings. This approach has been developed to facilitate 'retrieval by example' in heterogeneous collections of graphical doc...

A Tool for Arabic Documents Indexing and Retrieval From a Web Virtual Library

This paper presents a method for automatic indexing and retrieval of Arabic documents from a virtual library. The library can be multilingual and may contain documents written in several different languages. All the documents are scanned in order to be stored in the library. The indexing method consists of using the document contents as indexes. They are first scanned and then submitted to a...

Local Thresholding Algorithm Based on Variable Window Size Statistics

In an automatic document conversion system, which builds digital documents from scanned articles, various adjustments are needed before the scanned image is fed to the layout analysis system. This is because the layout detection system is sensitive to errors when the page elements are not properly identified, represented, denoised, etc. One such adjustment is the detection of for...

Publication date: 2012